16 research outputs found
Blending Generative Adversarial Image Synthesis with Rendering for Computer Graphics
Conventional computer graphics pipelines require detailed 3D models, meshes,
textures, and rendering engines to generate 2D images from 3D scenes. These
processes are labor-intensive. We introduce Hybrid Neural Computer Graphics
(HNCG) as an alternative. The contribution is a novel image formation strategy
to reduce the 3D model and texture complexity of computer graphics pipelines.
Our main idea is straightforward: Given a 3D scene, render only important
objects of interest and use generative adversarial processes for synthesizing
the rest of the image. To this end, we propose a novel image formation strategy
to form 2D semantic images from 3D scenery consisting of simple object models
without textures. These semantic images are then converted into photo-realistic
RGB images with a state-of-the-art conditional Generative Adversarial Network
(cGAN) based image synthesizer trained on real-world data. Meanwhile, objects
of interest are rendered with a physics-based graphics engine, giving us full
control over their appearance. Finally, the partially-rendered and
cGAN-synthesized images are
blended with a blending GAN. We show that the proposed framework outperforms
conventional rendering with ablation and comparison studies. Semantic retention
and Fr\'echet Inception Distance (FID) measurements were used as the main
performance metrics
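As a concrete illustration of the FID metric named above: FID fits a Gaussian to Inception-v3 activations of real and synthesized images and measures the Fréchet distance between the two Gaussians. The sketch below is a simplified, pure-Python version that assumes diagonal covariances (so the matrix square-root term reduces to a per-dimension sum) and uses toy feature vectors in place of Inception activations.

```python
# Sketch of the Frechet Inception Distance (FID) under a simplifying
# diagonal-covariance assumption. Real FID uses full covariance matrices of
# Inception-v3 activations; the toy tuples below stand in for those features.
from math import sqrt
from statistics import fmean, pvariance

def fid_diagonal(feats_a, feats_b):
    """FID between two feature sets, assuming diagonal covariances:

    d^2 = ||mu_a - mu_b||^2 + sum_i (s_ai + s_bi - 2*sqrt(s_ai * s_bi))
    """
    dims = len(feats_a[0])
    total = 0.0
    for i in range(dims):
        col_a = [f[i] for f in feats_a]
        col_b = [f[i] for f in feats_b]
        mu_a, mu_b = fmean(col_a), fmean(col_b)
        s_a, s_b = pvariance(col_a, mu_a), pvariance(col_b, mu_b)
        total += (mu_a - mu_b) ** 2 + s_a + s_b - 2.0 * sqrt(s_a * s_b)
    return total

# Identical feature distributions give FID 0; a mean shift raises it.
real = [(0.0, 0.0), (2.0, 2.0)]
fake = [(1.0, 1.0), (3.0, 3.0)]
print(fid_diagonal(real, real))  # 0.0
print(fid_diagonal(real, fake))  # 2.0
```

Lower FID indicates that synthesized images are statistically closer to real ones, which is why it serves as the main image-quality metric here.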
Vision Language Models in Autonomous Driving and Intelligent Transportation Systems
The applications of Vision-Language Models (VLMs) in the fields of Autonomous
Driving (AD) and Intelligent Transportation Systems (ITS) have attracted
widespread attention due to their outstanding performance and the ability to
leverage Large Language Models (LLMs). By integrating language data, vehicles
and transportation systems can deeply understand real-world environments,
improving driving safety and efficiency. In this work, we present
a comprehensive survey of the advances in language models in this domain,
encompassing current models and datasets. Additionally, we explore the
potential applications and emerging research directions. Finally, we thoroughly
discuss the challenges and research gaps. This paper aims to provide
researchers with an overview of current work and future trends of VLMs in AD
and ITS.
3D Understanding of Deformable Linear Objects: Datasets and Transferability Benchmark
Deformable linear objects are ubiquitous in our everyday lives. It is
often challenging even for humans to visually understand them, as the same
object can be entangled so that it appears completely different. Examples of
deformable linear objects include blood vessels and wiring harnesses, vital to
the functioning of their corresponding systems, such as the human body and a
vehicle. However, no point cloud datasets exist for studying 3D deformable
linear objects. Therefore, we are introducing two point cloud datasets,
PointWire and PointVessel. We evaluated state-of-the-art methods on the
proposed large-scale 3D deformable linear object benchmarks. Finally, we
analyzed the generalization capabilities of these methods by conducting
transferability experiments on the PointWire and PointVessel datasets.
Risky Action Recognition in Lane Change Video Clips using Deep Spatiotemporal Networks with Segmentation Mask Transfer
Advanced driver assistance and automated driving systems rely on risk
estimation modules to predict and avoid dangerous situations. Current methods
use expensive sensor setups and complex processing pipelines, limiting their
availability and robustness. To address these issues, we introduce a novel deep
learning based action recognition framework for classifying dangerous lane
change behavior in short video clips captured by a monocular camera. We
designed a deep spatiotemporal classification network that uses pre-trained
state-of-the-art instance segmentation network Mask R-CNN as its spatial
feature extractor for this task. The Long Short-Term Memory (LSTM) and
shallower final classification layers of the proposed method were trained on a
semi-naturalistic lane change dataset with annotated risk labels. A
comprehensive comparison of state-of-the-art feature extractors was carried out
to find the best network layout and training strategy. The best result, with a
0.937 AUC score, was obtained with the proposed network. Our code and trained
models are available open-source. Comment: 8 pages, 3 figures, 1 table.
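To illustrate the temporal half of the architecture described above (per-frame spatial features fed into an LSTM and a shallow classifier), here is a minimal pure-Python sketch. The scalar per-frame values stand in for Mask R-CNN feature vectors, and all weights are toy values chosen for the example, not the trained model.

```python
# Sketch: per-frame features (stand-ins for Mask R-CNN spatial features) are
# aggregated by a single scalar LSTM cell; the final hidden state feeds a
# logistic risk classifier. Weights here are illustrative, not learned.
from math import exp, tanh

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def lstm_step(x, h, c, w):
    """One LSTM step for scalar input/state, with gates i, f, o and candidate g."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])  # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])  # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])  # output gate
    g = tanh(w["wg"] * x + w["ug"] * h + w["bg"])     # candidate cell state
    c = f * c + i * g
    h = o * tanh(c)
    return h, c

def classify_clip(frame_feats, w, w_out=2.0, b_out=-1.0):
    """Run the LSTM over a clip's frame features; return a risk probability."""
    h = c = 0.0
    for x in frame_feats:
        h, c = lstm_step(x, h, c, w)
    return sigmoid(w_out * h + b_out)

# Toy weights: every gate half-open (zero pre-activation), candidate tracks x.
w = {k: 0.0 for k in ("wi","ui","bi","wf","uf","bf","wo","uo","bo","ug","bg")}
w["wg"] = 1.0
calm  = classify_clip([0.0, 0.0, 0.0], w)  # low per-frame activation
risky = classify_clip([2.0, 2.0, 2.0], w)  # high per-frame activation
print(calm < risky)  # True
```

In the actual pipeline the scalar input would be a high-dimensional feature vector per frame and the LSTM would be a learned multi-unit layer, but the control flow (extract per frame, aggregate over time, classify once per clip) is the same.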